{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Ragas to evaluate RAG pipelines\n",
    "\n",
    "In this notebook, we will showcase how to use Opik with Ragas for monitoring and evaluation of RAG (Retrieval-Augmented Generation) pipelines.\n",
    "\n",
    "There are three main ways to use Opik with Ragas:\n",
    "\n",
    "1. Using Ragas metrics to score traces\n",
    "2. Using Ragas metrics with the Opik `evaluate` function to score a dataset\n",
    "3. Using the Ragas `evaluate` function to score a dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating an account on Comet.com\n",
    "\n",
    "[Comet](https://www.comet.com/site?from=llm&utm_source=opik&utm_medium=colab&utm_content=ragas&utm_campaign=opik) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm&utm_source=opik&utm_medium=colab&utm_content=ragas&utm_campaign=opik) and grab your API key.\n",
    "\n",
    "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm&utm_source=opik&utm_medium=colab&utm_content=ragas&utm_campaign=opik) for more information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --quiet --upgrade opik ragas nltk openai langchain-openai datasets nest-asyncio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import opik\n",
    "\n",
    "opik.configure(use_local=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing our environment\n",
    "\n",
    "First, we will configure the OpenAI API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Integrating Opik with Ragas\n",
    "\n",
    "### Using Ragas metrics to score traces\n",
    "\n",
    "Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline, including but not limited to: `answer_relevancy`, `answer_similarity`, `answer_correctness`, `context_precision`, `context_recall`, `context_entity_recall`, `summarization_score`. You can find a full list of metrics in the [Ragas documentation](https://docs.ragas.io/en/latest/concepts/metrics/available_metrics/).\n",
    "\n",
    "These metrics can be computed on the fly and logged to traces or spans in Opik. For this example, we will start by creating a simple RAG pipeline and then scoring it using the `answer_relevancy` metric.\n",
    "\n",
    "#### Create the Ragas metric\n",
    "\n",
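    "In order to use a Ragas metric outside of the Ragas `evaluate` function, you need to initialize it yourself with an LLM and, for metrics such as `answer_relevancy` that rely on embeddings, an embeddings model. For this example, we will use LangChain's OpenAI chat and embedding models, wrapped with Ragas' `LangchainLLMWrapper` and `LangchainEmbeddingsWrapper`. We then wrap the metric in Opik's `RagasMetricWrapper` so that each scoring call is traced in Opik.\n",
    "\n",
    "We start by initializing the Ragas metric:"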
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the metric\n",
    "from ragas.metrics import AnswerRelevancy\n",
    "\n",
    "# Import some additional dependencies\n",
    "from langchain_openai.chat_models import ChatOpenAI\n",
    "from langchain_openai.embeddings import OpenAIEmbeddings\n",
    "from ragas.llms import LangchainLLMWrapper\n",
    "from ragas.embeddings import LangchainEmbeddingsWrapper\n",
    "from opik.evaluation.metrics import RagasMetricWrapper\n",
    "\n",
    "# Initialize the Ragas metric\n",
    "llm = LangchainLLMWrapper(ChatOpenAI())\n",
    "emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())\n",
    "\n",
    "ragas_answer_relevancy = AnswerRelevancy(llm=llm, embeddings=emb)\n",
    "\n",
    "# Wrap the Ragas metric with RagasMetricWrapper for Opik integration\n",
    "answer_relevancy_metric = RagasMetricWrapper(\n",
    "    ragas_answer_relevancy,\n",
    "    track=True,  # This enables automatic tracing in Opik\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
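    "Once the metric wrapper is set up, you can use it to score a sample question. The `RagasMetricWrapper` runs the asynchronous Ragas scoring code and logs the call to Opik for you. Because Jupyter already runs an event loop, we first apply `nest_asyncio` so that this async code can run inside the notebook."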
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For Jupyter notebook compatibility\n",
    "# This is needed for async operations in Jupyter notebooks\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPIK_PROJECT_NAME\"] = \"ragas-integration\"\n",
    "\n",
    "# Score a simple example using the RagasMetricWrapper\n",
    "score_result = answer_relevancy_metric.score(\n",
    "    user_input=\"What is the capital of France?\",\n",
    "    response=\"Paris\",\n",
    "    retrieved_contexts=[\"Paris is the capital of France.\", \"Paris is in France.\"],\n",
    ")\n",
    "\n",
    "print(f\"Answer Relevancy score: {score_result.value}\")\n",
    "print(f\"Metric name: {score_result.name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you now navigate to Opik, you will see that a new trace has been created in the `ragas-integration` project.\n",
    "\n",
    "#### Score traces\n",
    "\n",
    "You can score traces by calling the metric inside a function decorated with `@track` and logging the result with `opik_context.update_current_trace`.\n",
    "\n",
    "The advantage of this approach is that the scoring span is added to the trace, allowing for a more fine-grained analysis of the RAG pipeline. However, the Ragas metric is computed synchronously within the pipeline call, so this approach might not be suitable for production use cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from opik import track, opik_context\n",
    "\n",
    "\n",
    "@track\n",
    "def retrieve_contexts(question):\n",
    "    # Retrieval step: for this example, the contexts are hard-coded\n",
    "    return [\"Paris is the capital of France.\", \"Paris is in France.\"]\n",
    "\n",
    "\n",
    "@track\n",
    "def answer_question(question, contexts):\n",
    "    # Answer step: for this example, the answer is hard-coded\n",
    "    return \"Paris\"\n",
    "\n",
    "\n",
    "@track\n",
    "def rag_pipeline(question):\n",
    "    # Define the pipeline\n",
    "    contexts = retrieve_contexts(question)\n",
    "    answer = answer_question(question, contexts)\n",
    "\n",
    "    # Score the pipeline using the RagasMetricWrapper\n",
    "    score_result = answer_relevancy_metric.score(\n",
    "        user_input=question, response=answer, retrieved_contexts=contexts\n",
    "    )\n",
    "\n",
    "    # Add the score to the current trace\n",
    "    opik_context.update_current_trace(\n",
    "        feedback_scores=[{\"name\": score_result.name, \"value\": score_result.value}]\n",
    "    )\n",
    "\n",
    "    return answer\n",
    "\n",
    "\n",
    "rag_pipeline(\"What is the capital of France?\")"
   ]
  },
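  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you compute several Ragas metrics inside the same tracked function, you can log them in a single call, since `feedback_scores` takes a list of `{\"name\": ..., \"value\": ...}` dictionaries. The cell below is a minimal sketch of that conversion. `to_feedback_scores` is a hypothetical helper (not part of the Opik API), `ExampleScore` is a stand-in for the objects returned by `RagasMetricWrapper.score`, and the metric names and values are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataclasses import dataclass\n",
    "\n",
    "\n",
    "# Hypothetical stand-in for the objects returned by RagasMetricWrapper.score();\n",
    "# only the `name` and `value` attributes used in this notebook are modelled.\n",
    "@dataclass\n",
    "class ExampleScore:\n",
    "    name: str\n",
    "    value: float\n",
    "\n",
    "\n",
    "def to_feedback_scores(results):\n",
    "    # Hypothetical helper: convert score objects into the list-of-dicts format\n",
    "    # passed to opik_context.update_current_trace(feedback_scores=...)\n",
    "    return [{\"name\": r.name, \"value\": r.value} for r in results]\n",
    "\n",
    "\n",
    "# Illustrative values only, not real metric outputs\n",
    "example_results = [\n",
    "    ExampleScore(name=\"answer_relevancy\", value=0.9),\n",
    "    ExampleScore(name=\"faithfulness\", value=1.0),\n",
    "]\n",
    "feedback_scores = to_feedback_scores(example_results)\n",
    "print(feedback_scores)\n",
    "\n",
    "# Inside a @track-decorated function such as rag_pipeline, you could then call:\n",
    "# opik_context.update_current_trace(feedback_scores=feedback_scores)"
   ]
  },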
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluating datasets using the Opik `evaluate` function\n",
    "\n",
    "You can also use Ragas metrics with the Opik `evaluate` function. It runs your evaluation task on every item of the dataset, computes the metrics on each output and returns a summary of the results.\n",
    "\n",
    "The `RagasMetricWrapper` can be passed directly in the `scoring_metrics` list of the Opik `evaluate` function, so no additional wrapper code is needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "import opik\n",
    "\n",
    "\n",
    "opik_client = opik.Opik()\n",
    "\n",
    "# Create a small dataset\n",
    "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
    "\n",
    "# Keep 3 rows and add Ragas-style field names (the original FiQA columns are kept too)\n",
    "hf_dataset = fiqa_eval[\"baseline\"].select(range(3))\n",
    "dataset_items = hf_dataset.map(\n",
    "    lambda x: {\n",
    "        \"user_input\": x[\"question\"],\n",
    "        \"reference\": x[\"ground_truths\"][0],\n",
    "        \"retrieved_contexts\": x[\"contexts\"],\n",
    "    }\n",
    ")\n",
    "dataset = opik_client.get_or_create_dataset(\"ragas-demo-dataset\")\n",
    "dataset.insert(dataset_items.to_list())  # insert expects a list of dicts\n",
    "\n",
    "\n",
    "# Create an evaluation task\n",
    "def evaluation_task(x):\n",
    "    return {\n",
    "        \"user_input\": x[\"question\"],\n",
    "        \"response\": x[\"answer\"],\n",
    "        \"retrieved_contexts\": x[\"contexts\"],\n",
    "    }\n",
    "\n",
    "\n",
    "# Use the RagasMetricWrapper directly - no need for custom wrapper!\n",
    "opik.evaluation.evaluate(\n",
    "    dataset,\n",
    "    evaluation_task,\n",
    "    scoring_metrics=[answer_relevancy_metric],\n",
    "    task_threads=1,\n",
    ")"
   ]
  },
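  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example the metric's inputs come from the dictionary returned by `evaluation_task`, so its keys need to match the arguments used in the earlier `score` calls: `user_input`, `response` and `retrieved_contexts`. The cell below is a minimal local sketch for checking this before launching a full evaluation. `missing_fields` and `example_task` are hypothetical names (the latter mirrors `evaluation_task`), and `sample_item` is a toy item that reuses the FiQA column names with made-up values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Metric inputs used in the earlier answer_relevancy_metric.score() calls\n",
    "REQUIRED_FIELDS = {\"user_input\", \"response\", \"retrieved_contexts\"}\n",
    "\n",
    "\n",
    "def missing_fields(task_output, required=REQUIRED_FIELDS):\n",
    "    # Hypothetical helper: return the required metric inputs absent from the task output\n",
    "    return sorted(required - set(task_output))\n",
    "\n",
    "\n",
    "def example_task(x):\n",
    "    # Same mapping as evaluation_task above\n",
    "    return {\n",
    "        \"user_input\": x[\"question\"],\n",
    "        \"response\": x[\"answer\"],\n",
    "        \"retrieved_contexts\": x[\"contexts\"],\n",
    "    }\n",
    "\n",
    "\n",
    "# Toy item with the FiQA column names; the values are illustrative only\n",
    "sample_item = {\n",
    "    \"question\": \"What is the capital of France?\",\n",
    "    \"answer\": \"Paris\",\n",
    "    \"contexts\": [\"Paris is the capital of France.\", \"Paris is in France.\"],\n",
    "}\n",
    "\n",
    "print(missing_fields(example_task(sample_item)))  # an empty list means nothing is missing\n",
    "print(missing_fields({\"user_input\": \"q\", \"response\": \"a\"}))  # retrieved_contexts is missing"
   ]
  },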
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluating datasets using the Ragas `evaluate` function\n",
    "\n",
    "If you are looking to evaluate a dataset, you can also use the Ragas `evaluate` function. When using this function, the Ragas library computes the metrics on all the rows of the dataset and returns a summary of the results.\n",
    "\n",
    "You can use the `OpikTracer` callback to log the results of the evaluation to the Opik platform:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from opik.integrations.langchain import OpikTracer\n",
    "from ragas.metrics import context_precision, answer_relevancy, faithfulness\n",
    "from ragas import evaluate\n",
    "\n",
    "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
    "\n",
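    "# Reformat the dataset to match the schema expected by the Ragas evaluate function;\n",
    "# faithfulness and answer_relevancy also need the generated answer as `response`\n",
    "dataset = fiqa_eval[\"baseline\"].select(range(3))\n",
    "\n",
    "dataset = dataset.map(\n",
    "    lambda x: {\n",
    "        \"user_input\": x[\"question\"],\n",
    "        \"response\": x[\"answer\"],\n",
    "        \"reference\": x[\"ground_truths\"][0],\n",
    "        \"retrieved_contexts\": x[\"contexts\"],\n",
    "    },\n",
    "    remove_columns=dataset.column_names,  # keep only the Ragas fields\n",
    ")\n",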
    "\n",
    "opik_tracer_eval = OpikTracer(tags=[\"ragas_eval\"], metadata={\"evaluation_run\": True})\n",
    "\n",
    "result = evaluate(\n",
    "    dataset,\n",
    "    metrics=[context_precision, faithfulness, answer_relevancy],\n",
    "    callbacks=[opik_tracer_eval],\n",
    ")\n",
    "\n",
    "print(result)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
